On security threats for robust perceptual hashing
Abstract
Perceptual hashing has to deal with the constraints of robustness, accuracy and security. After modeling the process of hash extraction and the properties involved in this process, two different security threats are studied, namely the disclosure of the secret feature space and the tampering of the hash. Two different approaches for performing robust hashing are presented: Random-Based Hash (RBH), where the security is achieved using a random projection matrix, and Content-Based Hash (CBH), where the security relies on the difficulty of tampering with the hash. As for digital watermarking, different security setups are also devised: the Batch Hash Attack, the Group Hash Attack, the Unique Hash Attack and the Sensitivity Attack. A theoretical analysis of the information leakage in the context of Random-Based Hash is proposed. Finally, practical attacks are presented: (1) Minor Component Analysis is used to estimate the secret projection of Random-Based Hashes and (2) salient point tampering is used to tamper with the hash of Content-Based Hash systems.

1. DEFINITIONS FOR ROBUST PERCEPTUAL HASHING

Robust perceptual hashing consists of extracting a small-dimensional vector, called either a hash, a signature or a fingerprint, from a high-dimensional content. This technique makes it possible to authenticate or identify contents that have been referenced in a database. Robust perceptual hashing has many applications. It can be used to prevent forgery of physical contents such as medicine packages, bank notes or valuable watches by checking that the hash of the content under scrutiny belongs to the database of genuine contents. It can be used to filter digital contents; for example, this technology is used on open Web 2.0 services like YouTube to filter the contents that are to be uploaded into the database. It can also be used in digital watermarking in order to generate content-dependent watermarks and perform content authentication via watermark detection.
Moreover, robust perceptual hashing has to fulfill three constraints:
1) Robustness to distortions: the ability of the hash function to produce asymptotically the same output for inputs that differ by a legitimate level of distortion, which can result from signal processing and/or desynchronization transformations applied to multimedia data;
2) Security: the property that modifying the hash should not be easily tractable for an adversary;
3) Universality: the performance of the hash has to be optimal or asymptotically optimal when no prior knowledge about the statistics of the input source distribution is available.

Additional constraints come from the fact that, on the one hand, the hash length has to be as small as possible in order to guarantee a fast scan of the hash database and, on the other hand, as will be shown in this paper, the hash length has to be as large as possible to satisfy performance requirements. It is important to notice that the constraint of security means that either the hash relies on a secret key (cf. Kerckhoffs' principle), or that modification of the hash is not possible. Note that while typical security attacks in watermarking are solely associated with the estimation of the secret key of the algorithm [6], security threats for perceptual hashing are numerous, since the adversary can devise two different attacks: she can either try to estimate the secret key used to generate the hash, or she can simply try to modify the content in order to tamper with the hash if its extraction scheme is public.

[Further author information: send correspondence to P.B.]

Robust hashes are constructed by extracting robust features from contents, which are afterward quantized using either a scalar quantizer or a vector quantizer. To decide whether a content belongs to a database of contents or not, a query is performed using the hash database.
It consists of extracting the hash of the queried content and looking for the most similar hash in the hash database. For example, the similarity can be computed using an $L_n$ distance or the angle between the vectors. From a geometrical point of view, generating a set of N robust distinct hashes from N distinct contents consists of extracting N feature vectors of smaller dimension. The robustness is guaranteed by the fact that the distance between each vector and its nearest neighbours is as large as possible.

In our notation, $\mathcal{H} = \{x \in \mathbb{R}^{N_v}\}$ denotes the set of vectors/contents to keep track of, with some secret key $K$, where $N_v$ (resp. $N_h$) is the dimensionality of the input data vector (resp. the hash). Moreover, $|\mathcal{H}| = N_c$, and $\psi_K(x) \in \mathbb{R}^{N_h}$ denotes the robust hash representation of $x$.

To draw a general picture of the context of robust hashing, we can distinguish two different ways to create robust hashes (see Fig. 1): the Random-Based Hashes (RBH), which are motivated by statistical considerations, and the Content-Based Hashes (CBH), which are motivated by content analysis.

Figure 1. Two different ways to extract robust hashes: content-based analysis and random feature extraction.

2. RANDOM-BASED HASHES AND CONTENT-BASED HASHES

The goal of this section is to present the two classes of robust hash and to give a theoretical framework for each.

2.1 Random-Based Hashes (RBH)

Random-Based Hashes have been devised in order to cope with both robustness and security constraints, ideally disregarding the statistics of the input data. In general, RBH are generated using a secret key: the hash is built using secret projections [3, 6], and the identification or query procedure is done by evaluating the distance between the stored hashes and the one extracted from the content.
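The general RBH pipeline described above (secret key, secret projections, distance-based query) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: seeding a pseudo-random generator with the key, the one-bit sign quantizer, the Hamming distance, and all names (`make_projection`, `rbh`, `hamming`, `identify`) and parameter values are assumptions made for the example.

```python
import math
import random


def make_projection(key, n_h, n_v):
    """Draw n_h unit-norm random vectors of dimension n_v.

    Assumption: the secret key simply seeds the pseudo-random generator.
    """
    rng = random.Random(key)
    rows = []
    for _ in range(n_h):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_v)]
        norm = math.sqrt(sum(a * a for a in row))
        rows.append([a / norm for a in row])
    return rows


def rbh(x, projection):
    """Project x on each secret vector and keep one bit (the sign) per projection.

    Assumption: a one-bit scalar quantizer stands in for the quantization step.
    """
    return tuple(
        1 if sum(a * v for a, v in zip(row, x)) >= 0.0 else 0
        for row in projection
    )


def hamming(h1, h2):
    """Number of positions where two binary hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def identify(query, database, projection):
    """Return the index of the stored hash closest to the hash of the query."""
    q = rbh(query, projection)
    return min(range(len(database)), key=lambda i: hamming(q, database[i]))


# Hypothetical usage: 5 synthetic contents, N_v = 256, N_h = 64.
secret_projection = make_projection(key=2009, n_h=64, n_v=256)
source = random.Random(0)
contents = [[source.gauss(0.0, 1.0) for _ in range(256)] for _ in range(5)]
hash_database = [rbh(x, secret_projection) for x in contents]

# A slightly distorted copy of content 3 is submitted as a query.
distorted = [v + source.gauss(0.0, 0.05) for v in contents[3]]
print("closest stored hash:", identify(distorted, hash_database, secret_projection))
```

The sign quantizer is chosen here only because it keeps the example short; a multi-level scalar quantizer or a vector quantizer would fit the same structure, with the Hamming distance replaced by a suitable distance between quantized vectors.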
It should be pointed out that besides this identification problem, the authentication problem can be handled by accepting or rejecting the content based on a single verification trial, which consists of the distance evaluation mentioned above between the hash extracted from the content and some piece of side information. We mostly concentrate on the analysis of issues relevant to the former setup (identification), but we shall give some highlights on authentication issues when needed.

Building an RBH consists of projecting the content vector $x$ on a set of random vectors to obtain an $N_h$-dimensional projected vector $\tilde{y}$, and quantizing $\tilde{y}$ to obtain the hash vector $\psi_K(x)$ (see Fig. 2).

Figure 2. Different fundamental blocks for computing a Random-Based Hash.

In order to ensure robustness, the whole system has to be designed in such a way that contents with different hashes have non-overlapping identification regions. Using the Central Limit Theorem, we can state that the $i$-th projection $\tilde{y}_i(x_k)$ of the content $X_k \sim \mathcal{N}(0, \sigma_x^2 I_{N_v})$ on the secret vector $a_i$ is Gaussian with law $\mathcal{N}(0, \sigma_x^2)$, provided that $\|a_i\| = 1$. Let $d$ denote the Euclidean distance between two hash vectors related to two distinct contents $x_1$ and $x_2$. We have:
Publication date: 2009